Bacteria colonizing the human intestinal tract exhibit a high phylogenetic diversity that
reflects their immense metabolic potential. The catalytic activity of gut microbes has
an important impact on gastrointestinal (GI) functions and host health. The microbial
conversion of carbohydrates and other food components leads to the formation of a large
number of compounds that affect the host metabolome and have beneficial or adverse
effects on human health. Metabolomics is a systems-biology approach focused
on understanding the metabolic responses of living systems to physio-pathological
stimuli through multivariate statistical analysis of data obtained from human body fluids
by different instrumental techniques. A metabolomic approach based on an analytical
platform can separate, detect, characterize, and quantify a wide range of metabolites
and their metabolic pathways. This approach has recently been applied to study the
metabolic changes triggered in the gut microbiota by specific diet components and diet
variations, specific diseases, and probiotic and synbiotic food intake. This review describes
the metabolomic data obtained by analyzing human fluids with different techniques,
in particular Gas Chromatography Mass Spectrometry Solid-phase Micro Extraction
(GC-MS/SPME), Proton Nuclear Magnetic Resonance (¹H-NMR) Spectroscopy, and Fourier
Transform Infrared (FTIR) Spectroscopy. This instrumental approach has good potential
for the identification and detection of biomarkers of specific food intake and diseases.
INTRODUCTION

The human intestine is home to some 100 trillion microorganisms
belonging to at least hundreds of species. The density of bacterial cells in
the colon has been estimated at 10^11 to 10^12 per ml, which makes
it one of the most densely populated known microbial habitats
(Eckburg et al., 2005; Ley et al., 2006; Vitali et al., 2010). This
microbial ecosystem serves numerous important functions for 
the human host, including protection against pathogens, nutrient 
processing, stimulation of angiogenesis, modulation of intesti- 
nal immune response, and regulation of host fat storage (Palmer 
et al., 2007). The composition of the adult gastrointestinal (GI) 
microbiota has been intensely studied, using both classical micro- 
biology cultivation and, more recently, culture-independent, 
small subunit (SSU) ribosomal DNA (rDNA) sequence-based 
methods (Palmer et al., 2006, 2007; Huse et al., 2008; Vitali et al.,
2010).

The prevention and treatment of gut-related diseases will 
strongly depend on understanding the mechanisms involved in 
the complex processes of digestion. However, the interactions 
between the diet, microbiota and host are largely unknown 
(Quigley, 2011; Del Chierico et al., 2012; Nicholson et al., 2012;
Payne et al., 2012).



This review evaluates the potential of different
analytical approaches to identify and detect the molecules
present in body fluids such as blood, urine, and feces that can be
regarded as originating from food digestion, diet/gut microbiota
interactions, and host/microbiota interactions, as well as disease markers.

DIET EFFECT ON HUMAN GUT METABOLOME 

Foods contain thousands of compounds which, upon digestion 
and metabolism, give rise to complex physiological reactions 
and possible changes in the intestinal microbiota profile result- 
ing in a plethora of metabolites present in body fluids such 
as blood, urine, feces, and saliva. The number and diversity
of chemical compounds in foods are remarkable. It has been estimated
that an omnivorous diet exposes humans to more than
15,000 components, 8000 of which are non-nutrients such
as dietary fiber, antioxidants, prebiotics, and probiotics (Wishart, 
2008). 

An individual food system such as milk contains more than
200 different oligosaccharides, and the edible plant metabolome
consists of more than 10,000 detectable compounds and 800 non-nutrient
phytochemicals (Wishart, 2008). Body fluids, and urine in
particular, can be regarded as a repository of these molecules, in
which any nutrient or non-nutrient that is not needed or is present
in excess is found.
Several food-specific biomarkers, e.g., related to wine, black tea, 
coffee, fruit, and vegetables, have been identified in urine and 
blood samples (Spencer et al., 2008; Wishart, 2008). Other compounds
occurring in body fluids, particularly blood, can
be regarded as biomarkers of the physiological response to foods.
These include products of lipid peroxidation such as
8-isoprostaglandin F2alpha and oxidative stress markers such
as malonaldehyde or glutathione.

Some diet components have a potentially direct quantitative
and qualitative impact on the microbial species that characterize
the gut microbiota. The metabolic output of these species can include
compounds that are physiologically active for the human host, which in
turn provides a stable environment for their proliferation. In
addition, the host has evolved to use bacterial fermentation
products as an energy source for the epithelial cells.
Some of the fermentation products, e.g., short chain fatty acids
(SCFAs), can positively affect biochemical and physiological
processes in the colon and support some important biological
functions (Wong et al., 2006). However, the influence of
the bacterial population composition on inter-individual differences
in metabolite production and colon health status is poorly
understood, except for the well-known relationships between
some specific metabolites and metabolic diseases (Le Gall et al.,
2011).

Metabolic profiling has wide potential for understanding the
complex interactions between components of the gut microbiota
and for elucidating the cause/effect relationships between specific
nutritional choices and the related shifts in microbiota
composition.

Identifying the metabolic signatures of many phenotypes
and linking them to nutritional choices is particularly
challenging. In fact, the whole set of metabolites that can be detected
in body fluids, which characterizes the metabolic phenotype
or "metabotype" (Waldram et al., 2009) of an individual, is
affected by various intrinsic and extrinsic factors including environment,
drugs, diet, lifestyle, and genetics. It should also be
considered that the content of non-nutrients in diets is higher
than that of nutrients. Such non-nutrients can exert a significant
effect on metabolomic profiles, giving rise to several
compounds that can be used as biomarkers to trace the food
origin.

HOST-MICROBIOME METABOLIC INTERACTIONS AND
CELL-CELL COMMUNICATION 

Mammalian-microbial symbiosis can play a role in the etiology
and development of several diseases, e.g., insulin resistance,
Crohn's disease (Marchesi et al., 2007; Kinross et al.,
2011), irritable bowel syndrome (IBS) (Martin et al., 2006;
Sartor, 2008), food allergies, gastritis and peptic ulcers, obesity,
cardiovascular disease, and GI cancers (Martin et al., 2011).
Activities of the gut microbiota can be highly specific, and it has
been reported that the establishment of Bifidobacteria is important
for the development of the immune system and the management
of gut functions (Ouwehand, 2007). As the microbiome
strongly interacts with the host to determine the metabotype,
which influences the outcomes of drug interventions, knowledge
of these interactions can support personalized healthcare
solutions (Martin et al., 2011). Diet also has a key role in
modulating and shaping the gut microbiota and, as a consequence,
different foods or their ingredients play a crucial role
in the selection of microbes and in the construction of a metabolic
signaling network.

The host and its gut microbiota coproduce a large array of 
small molecules during the conversion of food and xenobiotics 
(compounds of non-host origin that enter the gut with the diet or 
are produced by microbiota), many of which play critical roles in 
shuttling information between host cells and the host's microbial 
symbionts. This chemical dialog includes signaling via low molec- 
ular weight metabolites, peptides, and proteins or may take place 
indirectly through immune-mediated pathways (Nicholson et al., 
2012). Metabolite production by microbes may influence
host health status, and the metabolites detected in different body fluids can
be considered biomarkers of dysbiosis or disease (Holmes et al.,
2011; Nicholson et al., 2012) (Figure 1).

Different regions of the human GI tract vary in terms of com- 
position of the indigenous microbiota (Gordon, 2012). For each 
compartment of the GI tract, a chemical dialog exists among dif- 
ferent microbial species, e.g., direct substrate provision in the 
microbial food web, quorum sensing, contact-dependent signaling,
and potentially gastro-transmitters (Ryan et al., 2009) as well
as between microbial symbionts and host cells (Li et al., 2008). 

Metabolites are the products and by-products of the many 
intricate biosynthetic and catabolic pathways existing in all liv- 
ing systems. Biological systems that exist at steady state maintain 
approximately constant concentrations of metabolic intermediates
(Martin et al., 2011).

Fecal extracts are the most common material used
to decipher the complex metabolic interplay between mammals
and their intestinal ecosystems, and metabolic profiles related to
SCFAs, amino acids, organic acids, nucleosides and nucleotides,
polyamines, phenolic compounds, bile acids, and lipid components
can yield information on a range of gut diseases (Martin
et al., 2011).

As described by Grider and Piland (2007), SCFAs stimu- 
late intestinal transit, and at physiological concentrations they 
induce an 8-10-fold increase in serotonin release in an in vitro 
colonic mucosal system. Moreover, Musso et al. (2011) showed 
that SCFAs are clearly one of the most important gut microbiota 
products and affect a range of host processes including energy uti- 
lization, host-microbe signaling, and control of colonic pH with 
consequent effects on microbiota composition, gut motility, and 
epithelial cell proliferation.

Monitoring of the fecal metabolome may also provide
diagnostic information for inflammatory bowel diseases
(IBDs), including Crohn's disease and ulcerative colitis (UC),
Hirschsprung's disease, celiac disease, allergy, etc. (Martin et al.,
2011). Metabolomics has many potential applications. It supports
functional genomics studies, systems biology, pharmacology, 
toxicology, and nutrigenomics. An approach called discovery 
metabolite profiling (DMP) is used to analyze all metabolites 
generated by a particular enzyme, providing a link between 
proteome and metabolome. 
FIGURE 1 | Schematic representation of diet, microbe, and host
interactions at gut level. The chemical dialogue via low molecular weight
metabolites, peptides, and proteins, both cell-to-cell and between host and microbes,
leads to the production of metabolites in different body fluids, which could be
considered disease biomarkers.



ANALYTICAL TECHNIQUES TO DETECT METABOLITES 

Metabonomics represents a well-recognized metabolic system 
approach that involves the study of multivariate metabolic 
responses of complex cellular organisms to different stimuli 
(Nicholson et al., 1999).

Ideally, it could be based on the following sequence: collecting
samples (e.g., urine, blood, or other body fluids), scanning
them in an instrument, and finding a profile of tens or
hundreds of chemicals that can predict whether an individual
is on the road to a disease or is likely to experience
side-effects from a particular drug (Pearson, 2007). However,
this vision presents some drawbacks. The first complication
is that one person's profile of metabolites is likely to be
dramatically different from another's, and each may fluctuate
markedly depending on factors such as
lifestyle, disease, diet, and nutrition (Pearson,
2007).

The detection and identification of hundreds of metabolites can
offer deep insights into the influence of lifestyle and dietary factors
in relation to specific diseases. The recent rapid development
of a range of analytical platforms, including gas chromatography
(GC), liquid chromatography (LC), high pressure LC
(HPLC), and ultra performance LC (UPLC) coupled to mass spectrometry
(MS), capillary electrophoresis (CE) coupled to MS,
Fourier Transform Infrared (FTIR) spectroscopy, and nuclear
magnetic resonance (NMR) [e.g., proton (¹H)-NMR] spectroscopy,
makes it possible to separate, detect, characterize, and quantify
such metabolites and related metabolic pathways (Zhang
et al., 2011). Metabolomics focuses on the complex interactions
of system components and highlights the whole system
rather than the individual parts, providing a distinct perspective
on cellular homeostasis (Liu et al., 2010). Although
NMR, GC-MS, and LC-MS are currently the prevalent techniques,
none of them can meet the requirement
of metabolomics to measure all metabolites. MS-based
metabolomics offers high selectivity and sensitivity for the identification
and quantification of metabolites, and its combination
with advanced, high-throughput separation techniques can
reduce the complexity of metabolite separation (Zhang et al.,
2011) (Table 1).

NMR-based metabolomics is able to provide a "holistic view"
of the metabolites under certain conditions, and thus is well
suited to metabolomic studies (Wu et al.,
2010). In particular, it is becoming a useful tool for the study of
body fluids, for the non-invasive detection of metabolites, and
for assessing the effects of diet on the gut microbiota or significant
public health problems (Martin et al., 2012).

Usually, the measurement of gut microbial metabolism
is confined to fecal samples, which give a limited picture
because of the high colonic absorption of bacterial
metabolites. Proton Nuclear Magnetic Resonance (¹H-NMR)
monitoring of the human fecal metabolome may also
provide diagnostic information for IBD, including
UC (Marchesi et al., 2007). Saric et al. (2008) evaluated
the similarities and dissimilarities across different mammals,
namely humans, mice, and rats, by using ¹H-NMR analysis.
The authors reported that human fecal extracts showed
greater inter-individual variation than those of rodents, reflecting
the natural genetic and environmental diversity of the human
population.

A major source of intestinal metabolites is the processing of
dietary nutrients by both host and microbes. As reported by Martin
et al. (2010), ¹H-NMR analysis of feces revealed that
supplementation with probiotics significantly affects the host-microbiota
interaction. Probiotic supplementation of "humanized"
mice, inoculated with a model of human baby microbiota,
was associated with metabolic changes in the protein metabolism
of Lactobacillus paracasei, in particular with a specific amino acid
pattern.
Table 1 | Common analytical techniques used in metabolomics.

NMR
  Advantages:
  • Rapid analysis
  • High resolution
  • No derivatization needed
  • Non-destructive
  Disadvantages:
  • Low sensitivity
  • Convoluted spectra
  • More than one peak per component
  • Libraries of limited use due to complex matrix
  Comments:
  • Chemical consideration: gives detailed structural information, particularly using 2-D-NMR of isolated metabolites
  • Chemical bias: these methods have little chemical bias and can be used directly on the sample
  • Speed: few minutes to hours. Depends on the strength of the magnet; sensitivity can be improved by magic angle spinning

GC-MS
  Advantages:
  • Sensitive
  • Robust
  • Large linear range
  • Large commercial and public libraries
  Disadvantages:
  • Slow
  • Often requires derivatization
  • Many analytes thermally unstable or too large for analysis
  Comments:
  • Chemical consideration: on its own will not generally lead to metabolite identification. However, coupled with MS and NMR is very powerful for analyte identification
  • Chemical bias: solvent extraction bias: non-polar vs. polar analytes. Need for chemical derivatization
  • Speed: very useful for separation, but typically takes 10-30 min

LC-MS
  Advantages:
  • No derivatization required (usually)
  • Many modes of separation available
  • Large sample capacity
  Disadvantages:
  • Slow
  • Limited commercial libraries
  Comments:
  • Chemical consideration: on its own will not generally lead to metabolite identification. However, coupled with MS and NMR is very powerful for analyte identification
  • Chemical bias: solvent bias means it is usually more applicable to polar compounds
  • Speed: very useful for separation, but typically takes 10-30 min

FT-IR
  Advantages:
  • Rapid analysis
  • Complete fingerprint of sample chemical composition
  • No derivatization needed
  Disadvantages:
  • Extremely convoluted spectra
  • More than one peak per component
  • Metabolite identification nearly impossible
  • Requires sample drying
  Comments:
  • Chemical consideration: provides limited structural information, but useful for identification of functional groups
  • Chemical bias: these methods have little chemical bias and can be used directly on the sample
  • Speed: 10-60 s

Dumas et al. (2006) applied the ¹H-NMR technique
to characterize the intergenome interactions in mice with synbiotic
gut microbiota, to monitor gut-microbial
metabolite variation in rats, and to study the intricate relationships
between gut microbiota and host co-metabotype associated
with diet-induced changes. Ndagijimana et al. (2009) studied,
by means of ¹H-NMR analysis of feces from healthy human subjects,
the effect of supplementation with a synbiotic food
based on Lactobacillus acidophilus, Bifidobacterium longum, and
fructooligosaccharides. These authors reported that the number
and the extent of metabolites in fecal slurries were strongly
affected by consumption of the synbiotic food, which gave rise to
characteristic metabolic signatures. The reproducibility of ¹H-NMR
metabolic profiles generated from water and methanol extracts
of human stools was assessed by Jacobs et al. (2008).
¹H-NMR is also one of the preferred platforms for
urine and plasma analysis (Ala-Korpela, 2008). The first published
study describing a metabolomics approach on urine
used the ¹H-NMR technique to monitor the effect
of the inclusion of soy in the diet (Solanky et al., 2003). Urine
samples have also been used to investigate responses to the ingestion
of chamomile tea or other foods such as coffee, wine, and tea, showing
that hippurate and glycine are important discriminatory
metabolites (Ito et al., 2005; Wang et al., 2005).

The ability to predict the occurrence of exercise-induced
ischemia in patients with suspected cardiovascular disease was
investigated by ¹H-NMR blood analysis. Barba et al. (2008)
demonstrated that lactate, glucose, lipids, and long-chain fatty acids are
the main metabolites involved. Xanthine and ascorbate were
proposed as possible markers of plaque formation in an
atherosclerotic mouse model (Leo and Darrow, 2009), and
lipoprotein subclasses can now be analyzed by a commercial
¹H-NMR-based protocol (Jeyarajah et al., 2006; Ala-Korpela,
2007).

Although the information provided by ¹H-NMR is highly
valuable, it is still limited by low resolution and sensitivity,
which allow the annotation and quantification of only a limited
number of low molecular weight molecules (Jansson et al., 2009).

GC-MS-based metabolomics requires a high-throughput technology
to handle a large volume of samples and accurate peak
identification through standard retention times and mass
spectra. GC-MS has been widely used for metabolomics and
can provide efficient and reproducible analysis (Zhang et al.,
2011). In fact, it can simultaneously provide profiles
of several hundred compounds including organic acids,
most amino acids, sugars, sugar alcohols, aromatic amines, and
fatty acids (FAs). GC-MS Solid-Phase Microextraction (SPME) based analysis
can be considered a very effective method for the rapid
qualitative and quantitative analysis of compounds in fecal samples, most of
which have not been previously reported (Garner et al., 2007).
In particular, GC-MS/SPME represents a novel method to study
the metabolic profiles of biological samples. This approach has been
used to compare neonate and adult feces (De Lacy Costello
et al., 2008) and to identify volatile markers of GI diseases
(Garner et al., 2007). The main metabolites identified by GC-MS/SPME
belong to sulphur compounds, nitrogen compounds
such as pyridine and its derivatives, pyrazines, indoles, aldehydes,
ketones, esters, alcohols, phenols, organic acids, hydrocarbons,
and stress molecules such as furans and furanones.
Vitali et al. (2010) investigated the impact of a synbiotic food
on human gut microbial ecology and metabolic profiles by
using this technique. While no significant changes in the structure
of the gut microbiota of healthy subjects were observed,
the synbiotic food intake generated significant changes in some
gut metabolic activities. The Canonical discriminant Analysis
of Principal coordinates (CAP) of the fecal metabolic profiles
showed a separation of subjects depending on synbiotic food
intake. Recently, GC-MS/SPME analysis of human feces has
also been proposed as a tool to evaluate the in vitro effect of
prebiotics and probiotics on the human microbiota (Vitali et al.,
2012).
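
The peak identification step mentioned above, i.e., matching on standard retention times and mass spectra, can be illustrated with a minimal sketch: a retention-time window followed by a cosine similarity score between spectra. The library entries, compound names, tolerance, and score threshold below are hypothetical placeholders rather than values from the cited studies, and dedicated GC-MS software generally uses more elaborate scoring.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_peak(rt, spectrum, library, rt_tolerance=0.2, min_score=0.8):
    """Return (name, score) of the best library match, or None if nothing fits.

    A candidate must fall inside the retention-time window and reach the
    minimum spectral similarity (both thresholds are assumed values).
    """
    best = None
    for name, (lib_rt, lib_spectrum) in library.items():
        if abs(rt - lib_rt) > rt_tolerance:
            continue
        score = cosine_similarity(spectrum, lib_spectrum)
        if score >= min_score and (best is None or score > best[1]):
            best = (name, score)
    return best


# Hypothetical library: name -> (retention time in min, {m/z: intensity}).
LIBRARY = {
    "compound_A": (5.10, {43: 100.0, 60: 80.0, 73: 40.0}),
    "compound_B": (5.15, {41: 100.0, 55: 60.0, 69: 30.0}),
    "compound_C": (9.80, {43: 100.0, 60: 80.0, 73: 40.0}),
}

match = identify_peak(5.12, {43: 95.0, 60: 85.0, 73: 35.0}, LIBRARY)
```

Combining both criteria matters here: compound_A and compound_C share the same hypothetical spectrum and are only told apart by retention time.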

Zheng et al. (2011) applied an untargeted GC-MS-based
metabonomics approach to profile bacterial metabolites in normal
Wistar rats administered a broad spectrum β-lactam
antibiotic, imipenem/cilastatin sodium. In-depth metabolic phenotyping
allowed the identification of 202 urinary and 223
fecal metabolites, many of which had not previously been reported (e.g.,
oligopeptides and carbohydrates), significantly related to a functional
metagenome (Zheng et al., 2011). Moreover, Maccaferri
et al. (2010) investigated the impact of rifaximin administration
on microbial metabolic profiles by using GC-MS/SPME.
This technique was also used in combination
with ¹H-NMR to investigate the metabolome of 19
children with celiac disease under a gluten-free diet (treated celiac disease,
T-CD) and 15 healthy children (HC). The metabolome of
T-CD and HC children was studied using fecal and urine samples.
With this approach, the authors showed that the levels of
volatile organic compounds and free amino acids in fecal and/or
urine samples were markedly affected by CD (Di Cagno
et al., 2011). Francavilla et al. (2012) used GC-MS/SPME to study
the influence of lactose on the composition of the gut microbiota
and metabolome of infants presenting with cow's milk
allergy.

Moreover, MS and HPLC techniques are commonly used for
the structural characterization of compounds. In the field of
metabolomics, MS and HPLC are often combined to characterize
unknown endogenous or exogenous compounds from a complex
biological matrix. LC is probably the most versatile separation
method, as it allows the separation of compounds across a wide range
of polarity with little effort in sample preparation (compared to
GC-MS) (Moco et al., 2007). Indeed, large-scale metabolomic
technologies based on LC-MS are gaining increasing attention
for their use in human disease diagnosis (Courant et al., 2009).

In addition, GC-MS and LC-MS have been integrated to provide
a comprehensive metabolic signature of malnutrition
in a rat model and to discover differential metabolites (Wu et al.,
2010).

Recently, the combination of UPLC with MS has covered
a large number of polar metabolites, enlarging the number of
detectable analytes.

Vibrational spectroscopies such as Raman and
FTIR are comparable to NMR in sensitivity, but FTIR allows high-throughput
screening and classification of biological samples, and
equally fits the "omics philosophy" of providing unbiased, whole-system
measurements (Kell, 2004). In particular, the use of
FTIR spectroscopy to monitor biochemical changes in living cells
has gained considerable importance in the last decade. In fact,
this technique makes it possible to simultaneously identify
various cellular biochemical targets, under both in vivo and in vitro
conditions, by exploiting the differential infrared absorption
of each metabolite at a specific wavenumber (Salman et al.,
2004). Molecules and systems of biological relevance that can
be detected by FTIR spectroscopy include
lipids and fatty acids, proteins, peptides, carbohydrates, and nucleic
acids, as well as biomembranes, animal tissues, microbial cells,
plants, and clinical samples (Dole et al., 2011). This technique
has been employed to characterize isolated biological molecules,
particularly proteins and lipids. More recently, it has
been used, with the aid of sophisticated sampling techniques
such as infrared imaging, in the diagnosis of many diseases
such as cervical cancer, Parkinson's disease, Alzheimer's disease,
kidney stones, and arthritis (Dole et al., 2011). In fact, FTIR spectroscopy
can effectively provide information on chemical variation
in the structure and composition of biological material at the
molecular level. Li et al. (2012) reported that colitis and colon
cancer can be successfully identified and distinguished by analyzing
colon biopsies through FTIR spectroscopy and chemometrics.
Also, Argov et al. (2004) used FTIR microspectroscopy to distinguish
IBDs from colitis-associated colon carcinomas when
pathological symptoms are similar. In particular, the study of differences
in specific regions of the infrared spectra allowed the
identification of discriminating features, e.g., the phosphate
content and RNA/DNA ratio, which differed among IBD, cancer, and
normal tissues. Furthermore, FTIR spectroscopy has also been
used by Bright et al. (2011) to determine the efficacy of dietary vitamin
supplementation on plasma homocysteine levels in hyperlipidemic
patients. Human plasma samples from healthy subjects and
chronic lymphocytic leukemia patients were analyzed by
FTIR spectroscopy by Erukhimovitch et al. (2006), leading to
the identification of specific spectral peaks associated with biomarkers
of the disease. Cluster analysis of the selected spectra provided
excellent classifications that correlated completely with clinical data.
Although these results are preliminary, they point to a
promising, rapid, effective, and economical technique
that can assist in disease diagnosis.

However, the choice of technique depends on the
approach followed: (1) HPLC, GC-MS, and LC-MS are employed
in metabolite target analysis, i.e., the determination and quantification
of known metabolites or products of particular
biochemical pathways; (2) HPLC-MS, CE-MS, and LC-NMR are
used in metabolite profiling studies, i.e., the survey of a larger, defined set
of compounds, and in lipid analysis; (3) NMR, Direct
Infusion electrospray Ionization-MS (DIMS), Laser Desorption
Ionization-MS (LDI-MS), Matrix Suppressed LDI-MS (MSLDI-MS),
FTIR, and Raman spectroscopy are used in metabolic
fingerprinting, i.e., the generation and comparison of sample
metabolic profiles to identify differences (Shulaev, 2006)
(Figure 2).
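
The three-way classification above can be restated as a small lookup structure, which may be convenient when annotating datasets by approach. The dictionary simply transcribes the list attributed to Shulaev (2006); the variable and function names are illustrative assumptions.

```python
# Approach -> techniques, transcribed from the classification in the text.
TECHNIQUES_BY_APPROACH = {
    "target analysis": ["HPLC", "GC-MS", "LC-MS"],
    "metabolite profiling": ["HPLC-MS", "CE-MS", "LC-NMR"],
    "metabolic fingerprinting": [
        "NMR", "DIMS", "LDI-MS", "MSLDI-MS", "FTIR", "Raman",
    ],
}


def approaches_for(technique):
    """Return the approaches under which a technique is listed."""
    return [
        approach
        for approach, techniques in TECHNIQUES_BY_APPROACH.items()
        if technique in techniques
    ]
```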

Furthermore, chromatography-MS systems are considered
the most favorable for the detection of large numbers
of metabolites in metabolomics, coupling chromatographic
metabolite separation with the sensitivity of MS detection.
GC-Time of Flight (TOF)-MS, two dimensional GC coupled to
TOF MS (GCxGC-TOF-MS), HPLC-MS, the analytically superior
UPLC-MS, and CE-MS have all been employed in mammalian
metabolomic studies, in either a metabolic profiling or a targeted
analysis approach (Dunn et al., 2008). GC-MS and GC-MS/SPME
have been widely used in metabolomic and metabonomic
studies; the latter is especially useful because
it does not require a difficult or extensive pre-analytical workflow.
Garner et al. (2009) demonstrated that GC-MS/SPME can
discriminate the volatile organic compound
profiles of fecal samples from preterm infants developing
necrotizing enterocolitis (NEC) from those of non-NEC controls.
Recent studies have suggested that the gut microbiota is
involved in numerous important biochemical functions for the 
host, in healthy and pathological conditions (Del Chierico et al., 
2012). 

Extracting valid conclusions from the analysis of
metabolomic data is as important as performing the analytical
measurements themselves; a variety of methods and
software packages transform the raw instrument data
into a list of metabolites. Some parameters,
such as the biological variation among individuals, sampling,
sample preparation, and analytical measurement, influence the
reproducibility of results, and these should be monitored as much
as possible by measuring both analytical and biological
replicates. In principle, biological variance should surpass all analytical
variance (Moco et al., 2007). Signal shifts (retention-time shifts) are common
in GC and, more severely, in LC, but occur only occasionally in NMR
and FTIR spectra. In NMR spectra, non-reproducibility seems
to be strictly related to sample preparation and is hardly ever due
to instrumental instability. Nevertheless, even under strictly controlled
conditions, signal shifts may persist. For this reason, the
use of signal-alignment software [e.g., MetAlign (De Vos et al.,
2007), XCMS (Smith et al., 2006), and MZmine (Katajamaa et al.,
2006)] has become a routine procedure for comparing chromatograms
or spectra in MS applications, while HiRes is suitable
for NMR (Zhao et al., 2006), transforming raw data into workable,
informative datasets.
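
To make the idea of signal alignment concrete, the following is a minimal sketch of the simplest possible case: a single rigid shift between a sample trace and a reference trace, estimated by maximizing their cross-correlation. This is an illustrative assumption and not the algorithm used by MetAlign, XCMS, MZmine, or HiRes, which handle non-linear, peak-wise shifts; the traces are made-up numbers.

```python
def estimate_lag(reference, sample, max_lag):
    """Return the integer lag that best maps the sample onto the reference.

    A positive lag means the sample signal appears `lag` points later
    than in the reference.
    """
    def score(lag):
        # Cross-correlation restricted to the overlapping region.
        return sum(
            reference[i] * sample[i + lag]
            for i in range(len(reference))
            if 0 <= i + lag < len(sample)
        )

    return max(range(-max_lag, max_lag + 1), key=score)


def align(sample, lag, fill=0.0):
    """Shift the sample by `lag` points, padding the exposed end with `fill`."""
    n = len(sample)
    return [sample[i + lag] if 0 <= i + lag < n else fill for i in range(n)]


# Hypothetical traces: the same peak, eluting two points later in the sample.
reference = [0, 0, 1, 5, 1, 0, 0, 0, 0, 0]
sample = [0, 0, 0, 0, 1, 5, 1, 0, 0, 0]

lag = estimate_lag(reference, sample, max_lag=3)
aligned = align(sample, lag)
```

After alignment the two traces can be compared point by point, which is the precondition for the multivariate analyses discussed in the next section.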
FIGURE 2 | Different approaches and respective techniques: pitfalls and strengths. The employed technique depends on the followed approach: targeted
analysis, metabolite profiling, and metabolic fingerprinting. 
HOW TO MANAGE AND INTEGRATE ALL DATASETS?

It is difficult to correlate specific markers with risk of disease,
nutritional state, or diet choices. Instead, by examining the system
as a whole and exploring multiple pathways simultaneously,
a better indication of a healthy or pathological state can be
acquired. This approach, characterized by the high-throughput
determination of hundreds of metabolites, leads to
a torrent of data. Over the years, the hardware and software
of these technologies have been constantly upgraded to
meet the demands for robustness, practicality, applicability, and
efficiency of the analyses. Therefore, powerful analytical strategies
in combination with advanced multivariate statistical tools
are required to look into this new "black box" and to
extract the maximum relevant information and knowledge about the
complex biological system (Bictash et al., 2010).

The use of single or combined analytical techniques, or of
different kinds of samples, inevitably raises the problem of
how to manage and represent the thousands of data points collected.

One of the expectations of the systems biologist is that these
datasets can be integrated to give a holistic picture of the state of
the system (e.g., development, ageing, health, or disease),
providing insights that enable a more fundamental biological
understanding by unveiling network connections at the molecular
level (Richards et al., 2010).

Appropriate experimental design, sample numbers, and
statistical analyses are required to ensure the generation of
accurate and valid hypotheses and biological conclusions (Tseng and
Wong, 2005). Powerful multivariate analyses, either supervised
or unsupervised, are required to interrogate the data and to define
structure related to similarities or differences between biological
systems (Thalamuthu et al., 2006).

The first step in simplifying the data, though not in an
arbitrary way, could be the use of Heat maps, in which the relative
intensity value of each peak is replaced by a small colored block
(Zhou et al., 2011). Heat maps were initially developed for
microarray studies, but they are already used in some metabolomic
studies (Spitale et al., 2012). Rajaram and Oono (2010) proposed a
new analysis, the so-called NeatMap, as an alternative to the
traditional Heat map; it offers a variety of novel plots (in two and
three dimensions) to be used in conjunction with principal
component analysis (PCA) and multi-dimensional scaling (MDS).
Although NeatMap has so far been used only in genomics, it could be
exploited in metabolomics as well. The second step could be the use
of multivariate statistical analysis methods that can interpret
different datasets. According to Richards et al. (2010), there are
three possible integration strategies: (1) inter-omic, or the
integration of data obtained from different -omic platforms
(metabonomics, genomics, proteomics, and transcriptomics);
(2) inter-platform, or the integration of data from different
spectroscopic platforms (NMR, GC-MS, LC-MS, etc.); (3) inter-sample,
or the integration of data obtained from different human samples
(plasma, urine, tissue, and faeces) (Figure 3).
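
As a minimal sketch of the first step, the following code shows how a table of peak intensities could be turned into the discrete color levels that a Heat map displays. The function name `heatmap_blocks`, the choice of column-wise autoscaling, the number of color levels, and the intensity values are all assumptions made for illustration; they are not taken from the cited works.

```python
import numpy as np

def heatmap_blocks(intensities, n_levels=5):
    """Autoscale each metabolite (column) and bin the values into color levels."""
    # z-scores make metabolites with very different abundances comparable
    z = (intensities - intensities.mean(axis=0)) / intensities.std(axis=0, ddof=1)
    # interior bin edges spanning the observed range of z
    edges = np.linspace(z.min(), z.max(), n_levels + 1)[1:-1]
    return np.digitize(z, edges)  # integers from 0 to n_levels - 1

# rows = samples, columns = metabolites (made-up values)
data = np.array([[1.0, 200.0, 0.5],
                 [2.0, 180.0, 0.7],
                 [8.0,  90.0, 0.6],
                 [9.0, 100.0, 0.4]])
levels = heatmap_blocks(data)
```

Each integer in `levels` would correspond to one colored block; a plotting library would then map the integers to a color scale.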

The most widespread multivariate methods are PCA and
Partial Least Squares (PLS) (Holmes et al., 2011). In particular,
PCA is widely used for the multivariate analysis of NMR or GC-MS
profiling data. It is based on modeling the natural variance within
a dataset, which helps to identify the underlying metabolite
variables that contribute to that variance (Biais et al., 2009). PLS and



FIGURE 3 | Schematic representation of statistical data integration methods in the area of inter-omic, inter-platform, and inter-sample integration. The datasets derived from the different metabolomic approaches are integrated by the most widespread multivariate methods. Management of the extracted datasets follows three approaches: (I) visual summarizing approach: Heat map and NeatMap; (II) multivariate analysis: PCA, CCA, PLS, OPLS, HPCA, HPLS, CPCA; (III) genetic or neural network.
its derivative, orthogonal PLS (OPLS), have also been used for
NMR profiling data. In comparison with PLS, OPLS produces
models that are clearer and therefore easier to interpret,
leading to better class resolution in a discriminant problem
(Stella et al., 2006). However, when PCA and PLS are used for
the interpretation of several different, but potentially connected,
datasets (called "blocks"), the loading plots are usually complex
because of co-variation in the spectrum, and are therefore difficult
to correlate with the corresponding score plot (Janne et al., 2001).
For this reason, a multiblock technique must be used. The common
trend of the different blocks is revealed in the "super scores" plot,
while the distribution of the samples within each individual block
is shown in the respective "block scores" plot; as in classical PCA,
the contribution of the variables to the trend shown in the
block scores plot is shown in the corresponding "block loadings"
plot (Biais et al., 2009).
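
The relation between scores and loadings in classical PCA, on which the multiblock plots build, can be sketched as follows. This is a minimal illustration using the singular value decomposition of the mean-centered data; the function name `pca`, the simulated two-group dataset, and the absence of any scaling step are assumptions, not a description of the workflow used in the cited studies.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA by SVD of the mean-centered data matrix (samples x variables)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # variable contributions
    explained = (s ** 2) / np.sum(s ** 2)             # fraction of variance
    return scores, loadings, explained[:n_components]

# Simulated data (assumed): two groups of 10 samples that differ
# only in the first two of five "metabolites"
rng = np.random.default_rng(0)
group = np.repeat([0.0, 1.0], 10)
X = rng.normal(scale=0.1, size=(20, 5))
X[:, 0] += 3.0 * group
X[:, 1] -= 3.0 * group
scores, loadings, explained = pca(X)
```

In this sketch the first column of `scores` separates the two groups, and the largest absolute entries in the first column of `loadings` point to the two variables responsible for the separation.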

The hierarchical multiblock segmentation techniques (HPCA,
HPLS) are based on new variables created from the original data
by blocking the spectra into sub-spectra and then projecting
the sub-spectra by PCA. These new variables are then used in
the subsequent PCA or PLS calculations, which reduces random and
unwanted signals (e.g., from light scatter) while conserving
all systematic information in the signals. The main advantage of
this technique is an easier interpretation of the correlation
between scores and loadings (Janne et al., 2001).
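
The two-level procedure described above can be sketched as follows: the spectra are split into sub-spectra, each sub-spectrum is projected by PCA, and the resulting block scores serve as new variables for a second PCA. The function names, the equal-width blocking, the number of components, and the random test spectra are assumptions for illustration; the published HPCA algorithm may differ in scaling and in how components are extracted.

```python
import numpy as np

def pca_scores(X, n_components):
    """Scores of a PCA computed by SVD of the mean-centered matrix."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def hierarchical_pca(spectra, n_blocks, block_components=1, top_components=2):
    """Block the spectra, project each block by PCA, then run PCA on the block scores."""
    blocks = np.array_split(spectra, n_blocks, axis=1)       # sub-spectra
    block_scores = [pca_scores(b, block_components) for b in blocks]
    new_variables = np.hstack(block_scores)                  # lower-level summary
    super_scores = pca_scores(new_variables, top_components) # upper-level model
    return new_variables, super_scores

# Assumed example: 12 spectra of 40 points each, split into 4 sub-spectra
rng = np.random.default_rng(1)
spectra = rng.normal(size=(12, 40))
new_variables, super_scores = hierarchical_pca(spectra, n_blocks=4)
```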

Another kind of multiblock PCA is Consensus PCA (CPCA). It can
focus the data analysis on the relation between the specified
metabolites and the remaining metabolites, and it searches for
trends that explain as much of the variation as possible (Biais
et al., 2009). CPCA was introduced as a method for comparing
several blocks of descriptor variables measured on the same
objects, and it differs from HPCA in the data
normalization (Westerhuis et al., 1998).
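
A minimal sketch of how one consensus component could be computed is given below. It follows an iterative scheme in which the block loadings and the super weights are normalized to unit length; this choice of normalization, the omission of any block scaling, the function name `consensus_pca_component`, and the simulated blocks are assumptions made for illustration and should be checked against Westerhuis et al. (1998) before use.

```python
import numpy as np

def consensus_pca_component(blocks, n_iter=500, tol=1e-12):
    """First consensus component of several blocks measured on the same samples."""
    blocks = [b - b.mean(axis=0) for b in blocks]
    t_super = blocks[0][:, 0].copy()          # starting guess for the super score
    for _ in range(n_iter):
        loadings, block_scores = [], []
        for Xb in blocks:
            p = Xb.T @ t_super
            p /= np.linalg.norm(p)            # block loadings normalized to length 1
            loadings.append(p)
            block_scores.append(Xb @ p)       # block scores
        T = np.column_stack(block_scores)
        w = T.T @ t_super
        w /= np.linalg.norm(w)                # super weights normalized to length 1
        t_new = T @ w                         # updated super score
        converged = np.linalg.norm(t_new - t_super) < tol * np.linalg.norm(t_new)
        t_super = t_new
        if converged:
            break
    return t_super, T, loadings, w

# Assumed example: two blocks sharing one common trend across 15 samples
rng = np.random.default_rng(2)
trend = np.linspace(-1.0, 1.0, 15)
block_a = np.outer(trend, [2.0, 1.0, -1.0]) + rng.normal(scale=0.05, size=(15, 3))
block_b = np.outer(trend, [1.5, -2.0]) + rng.normal(scale=0.05, size=(15, 2))
t_super, block_scores, block_loadings, super_weights = consensus_pca_component(
    [block_a, block_b])
```

Without block scaling, the super score obtained in this way points in the same direction as the first PCA score of the concatenated blocks, which gives a simple consistency check for the sketch.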

In addition, multivariate analyses such as Canonical
Correlation Analysis (CCA) and regularized CCA are widely
used to integrate datasets obtained from different source materials
(Yamamoto et al., 2008). However, when the "normal" variance
is greater than that explained by the process of interest,
more sophisticated genetic (Johnson et al., 2001) or neural
network (Ahmed et al., 2009) based algorithms may identify the
underlying variables of significance that are not associated with
the greatest variance (the "normal" condition). In these cases,
care must be taken that such algorithm-based models are not
overfitted (Biais et al., 2009).
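
A minimal sketch of regularized CCA between two datasets measured on the same samples is given below. It adds a small ridge term to each covariance matrix, whitens both datasets, and takes the singular value decomposition of the whitened cross-covariance. The function name `regularized_cca`, the ridge value `reg`, and the simulated data are assumptions for illustration, not the procedure of Yamamoto et al. (2008).

```python
import numpy as np

def regularized_cca(X, Y, reg=1e-3):
    """Canonical correlations and weights with ridge-regularized covariances."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Cholesky factors: Sxx = Lx Lx^T and Syy = Ly Ly^T
    Lx, Ly = np.linalg.cholesky(Sxx), np.linalg.cholesky(Syy)
    # Whitened cross-covariance: Lx^-1 Sxy Ly^-T
    M = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    U, corr, Vt = np.linalg.svd(M, full_matrices=False)
    A = np.linalg.solve(Lx.T, U)       # canonical weights for X
    B = np.linalg.solve(Ly.T, Vt.T)    # canonical weights for Y
    return corr, A, B

# Assumed example: two datasets that share a single latent variable
rng = np.random.default_rng(3)
shared = rng.normal(size=60)
X = np.column_stack([shared + 0.1 * rng.normal(size=60),
                     rng.normal(size=60),
                     rng.normal(size=60)])
Y = np.column_stack([rng.normal(size=60),
                     -shared + 0.1 * rng.normal(size=60)])
corr, A, B = regularized_cca(X, Y)
```

The regularization term keeps the covariance matrices invertible when the number of variables approaches or exceeds the number of samples, a common situation in metabolomic datasets.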

As the holistic approach to understanding biological systems
continually evolves, the techniques used to analyze and integrate
the datasets must be steadily updated to address the problems
that arise during their application. Hence, systems biologists
have a key role in interpreting the meaning of the
results obtained from dataset management.